- Main Datasets (w/ hospitalised data)
- An Exploratory data analysis of the US dataset.
- Validate data types and data integrity of each row.
- Graphical Exploratory Analysis.
- Analysis of Hospitalizations by State.
- Red plots are Republican-governed states. Blue plots are Democratic-governed states.
- Alabama
- Arizona
- Arkansas
- California
- Colorado
- Connecticut
- Delaware
- Florida
- Georgia
- Hawaii
- Idaho
- Iowa
- Kansas
- Kentucky
- Louisiana
- Maine
- Maryland
- Massachusetts
- Michigan
- Minnesota
- Mississippi
- Missouri
- Montana
- Nebraska
- Nevada
- New Hampshire
- New Jersey
- New Mexico
- New York
- North Carolina
- Ohio
- Oklahoma
- Oregon
- Pennsylvania
- Rhode Island
- South Carolina
- South Dakota
- Tennessee
- Texas
- Utah
- Vermont
- Virginia
- Washington
- West Virginia
- Wisconsin
- Wyoming
- Assessing Correlation of Independent Variables.
- Build Model for Dependent Variables
Main Datasets (w/ hospitalised data)
Source: https://covidtracking.com/
Source: https://github.com/CSSEGISandData/COVID-19
Various state data, third-party data, and various federal data.
# see what filtered main dataframe looks like for all 50 states:
all_cases.head(50)
#Add state level data, beds, beds/1k, population, abbreviation, and name:
all_cases.head(50)
- Load and clean JHU data
- Merge JHU dataset with main dataset
#Load the Johns Hopkins data
jhu_df.tail(50)
#Grab all historical data and ensure we have the 1st US case.
all_cases.tail()
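The merge of the JHU data into the main dataset could look roughly like the sketch below. It is only an illustration under stated assumptions: the two small frames, the join keys `date` and `state`, the columns `hospitalizedCurrently` and `confirmed`, and the state codes `AA`/`BB` are hypothetical stand-ins with made-up values, not the notebook's actual data or code.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the two sources (made-up values).
tracking_sample = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-01"]),
    "state": ["AA", "AA", "BB"],
    "hospitalizedCurrently": [100, 120, 900],
})
jhu_sample = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-01", "2020-04-02"]),
    "state": ["AA", "AA", "BB", "BB"],
    "confirmed": [7000, 8000, 83000, 92000],
})

# Outer merge keeps rows present in only one source;
# indicator=True adds a "_merge" column recording where each row came from,
# and validate raises if a (date, state) pair is duplicated on either side.
merged_sample = tracking_sample.merge(
    jhu_sample,
    on=["date", "state"],
    how="outer",
    indicator=True,
    validate="one_to_one",
)

# Rows that did not match in both sources deserve a closer look before analysis.
unmatched = merged_sample[merged_sample["_merge"] != "both"]
print(unmatched)
```

An outer join with `indicator=True` is chosen here so that mismatched dates or states show up explicitly instead of being silently dropped, as they would be with an inner join.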
#Check that the data types are correct and review the combined, cleaned, validated, and merged dataset for all 50 states:
covid_df.head(50)
The NaN values may indicate that there were few or no Covid-19 patients on those dates. We further analyse the summary statistics of the dataset columns to check data integrity and accuracy.
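One way to test the reading that NaN values reflect early dates with few or no patients is to check whether the missing values all fall before each state's first reported value. The sketch below uses a hypothetical stand-in frame with made-up values; the column name `hospitalizedCurrently` and the state codes are assumptions, not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for covid_df (made-up values).
sample_df = pd.DataFrame({
    "state": ["AA", "AA", "AA", "BB", "BB", "BB"],
    "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"] * 2),
    "hospitalizedCurrently": [np.nan, np.nan, 12.0, np.nan, 5.0, 9.0],
})

# Count missing values per column.
nan_counts = sample_df.isna().sum()

# For each state, find the first date with a reported value.
first_reported = (
    sample_df.dropna(subset=["hospitalizedCurrently"])
    .groupby("state")["date"]
    .min()
)

# NaNs that occur after a state's first reported value are reporting gaps,
# not early dates with no patients.
late_nans = sample_df[
    sample_df["hospitalizedCurrently"].isna()
    & (sample_df["date"] > sample_df["state"].map(first_reported))
]
print(nan_counts)
print(late_nans)
```

If `late_nans` is empty, the missing values are consistent with the early-dates explanation; any rows it does contain would point to gaps in reporting instead.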
#Validate the data with mean, standard deviation, min/max, and quartiles:
covid_df.describe()
# TODO rounding up the numbers
#final_100k_last_month.head()
#Review the output for per capita measures:
final_100k_last_month.describe()
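For reference, a per-100k measure of the kind summarised above can be computed as in the following minimal sketch. The frame, the column names `population` and `hospitalizedCurrently`, the state codes, and all values are hypothetical and made up for illustration; the notebook's own construction of `final_100k_last_month` is not shown in this section.

```python
import pandas as pd

# Hypothetical state-level frame (made-up values).
per_capita_sample = pd.DataFrame({
    "state": ["AA", "BB"],
    "population": [21_500_000, 19_500_000],
    "hospitalizedCurrently": [4300, 1950],
})

# Scale the raw count by population and express it per 100,000 residents,
# rounded to one decimal place for readability.
per_capita_sample["hospitalized_per_100k"] = (
    per_capita_sample["hospitalizedCurrently"]
    / per_capita_sample["population"]
    * 100_000
).round(1)
print(per_capita_sample)
```

Normalising by population makes states of very different sizes comparable on the same scale.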
#Validate all US data:
timeseries_usa_df.tail()
# TODO fix legend/axis/plot altogether
# Time series plot
fig, ax = plt.subplots(figsize = (16, 12))
plt.plot(fl.date, fl.positiveTestsViral, linewidth=4.7, color='r')
plt.title('Cumulative Number of Positive Viral Tests in Florida', fontsize=23)
plt.xlabel('Date')
plt.ylabel('No. Patients')